High expression of E2F transcription factor 7: An independent predictor of poor prognosis in patients with lung adenocarcinoma

Adenocarcinoma is the most common pathological type of lung cancer. The transcription factor E2F7 has been linked to the occurrence and development of a variety of solid tumors, but its relationship with the prognosis of lung cancer remains unclear. We therefore conducted this study to explore the prognostic value of E2F7 in patients with lung adenocarcinoma (LUAD). We analyzed samples from The Cancer Genome Atlas (TCGA) to examine the correlation between E2F7 expression and clinical features, the difference in expression between tumor and normal tissues, the prognostic and diagnostic value of E2F7, and the enrichment of E2F7-related genes. All statistical analyses were performed with R statistical software (version 3.6.3). E2F7 expression was significantly higher in LUAD than in normal lung tissue (P = 1e-34). High expression of E2F7 was significantly correlated with gender (P = .034), pathologic stage (P = .046) and M stage (P = .025). Multivariate Cox analysis confirmed that E2F7 is an independent risk factor for overall survival (OS) in LUAD patients (P = .027). Genes related to cell cycle checkpoints, DNA damage/telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase were differentially enriched in the E2F7 high expression group. In short, high expression of E2F7 is an independent risk factor for OS in LUAD patients and has high diagnostic value.


Introduction
Lung cancer is the second most common malignant tumor in the world. In 2020, there were approximately 2.2 million new cases of lung cancer worldwide. Lung cancer is the leading cause of cancer death, accounting for about 18% of all cancer deaths (about 1.8 million). [1] Lung adenocarcinoma is its most common pathological type. [2] Early lung cancer has no obvious symptoms, so most patients are already at an advanced stage when they are diagnosed, which contributes to the generally low survival rate of lung cancer patients.
The E2F transcription factor family plays a key role in the occurrence and development of tumors because of its functions in cell cycle regulation and apoptosis. [3] E2F7 is a recently discovered member of the family. Unlike other members, it has two DNA-binding domains (DBDs), lacks an RB-binding domain, and does not need to bind dimerization proteins to enter the nucleus. [4,5] E2F7 is a priming factor involved in cell cycle regulation, apoptosis and differentiation, as well as in the late stage of mitosis, embryonic development and the DNA stress response, and it is likely to participate in tumorigenesis. [6][7][8][9] E2F7 is an epithelial transcriptional repressor whose amplification, overexpression or deletion can be observed in many malignant tumors, and it can affect tumor differentiation, proliferation and metastasis by interacting with different downstream targets. E2F7 is abnormally expressed in glioma, [10,11] colon cancer [12][13][14] and breast cancer, [15,16] and has an important influence on the occurrence and development of a variety of tumors.
In view of this, we conducted this study to explore the expression of E2F7 in lung adenocarcinoma (LUAD) and to analyze its correlation with clinical parameters and its diagnostic and prognostic value in LUAD patients.

Patient data set
E2F7 mRNA expression data (594 samples, FPKM format) and clinical data were downloaded from the TCGA database. The data for the pan-cancer analysis came from UCSC XENA (https://xenabrowser.net/datapages/): RNA-seq data in TPM format from TCGA and GTEx, uniformly processed by the Toil pipeline. Inclusion criteria: 1. Sufficient survival information; 2. Definite gene expression value. All our data come from public databases such as GEO and TCGA. Ethical approval was obtained for the patients included in these databases. Our research is based on open-source data and therefore does not require ethics committee approval.
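Within a single sample, FPKM and TPM values differ only by a rescaling. As a minimal sketch (this is not the Toil pipeline itself; the function name `fpkm_to_tpm` and the toy values are illustrative assumptions), the usual conversion divides each gene's FPKM by the sample's FPKM total and multiplies by one million:

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value / total * 1e6 for value in fpkm_values]


# Hypothetical three-gene sample, for illustration only.
sample_fpkm = [10.0, 30.0, 60.0]
sample_tpm = fpkm_to_tpm(sample_fpkm)
print(sample_tpm)
```

After conversion, every sample sums to the same total, so a gene's TPM can be read as its share of the sample's transcripts.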

Statistical analysis
The median of E2F7 expression was selected as the cutoff value. The Wilcoxon signed rank test was used to test the differential expression of E2F7 between LUAD and normal tissues, and the results were displayed as box plots. The Wilcoxon rank sum test and Dunn's test were used to test whether the expression of E2F7 is related to clinical features in LUAD.
Table 2. Logistic analysis of the association between E2F7 expression and clinical characteristics.
Kaplan-Meier analysis was performed to compare overall survival (OS) and disease-specific survival (DSS) between the E2F7 high and low expression groups and to draw survival curves. [17] The pROC and ggplot2 packages were used to evaluate the diagnostic performance of E2F7 and draw the ROC curve, with the area under the curve (AUC) representing the diagnostic value. Univariate Cox regression analysis was used to screen potential prognostic factors, and multivariate Cox regression was used to verify the independent predictive value of multiple indicators, including E2F7, for prognosis. The rms and survival packages were used to draw nomograms showing the relationship between the variables and survival rates. The clusterProfiler and org.Hs.eg.db packages were used for GO and KEGG enrichment analysis. [18] The clusterProfiler and ggplot2 packages were used to perform and plot the GSEA. In addition, we used an independent GEO data set (GSE50081) for external validation. The difference in E2F7 expression between tumor and normal tissues across cancer types was verified in UCSC XENA (https://xenabrowser.net/datapages/) [19] and the TIMER database (https://cistrome.shinyapps.io/timmer/). All statistical analyses were performed with R statistical software (version 3.6.3).
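The Kaplan-Meier comparison of high and low expression groups can be illustrated in a few lines. The analysis in this study was run in R; the following is only a Python sketch of the product-limit estimator under stated assumptions: the function names, the toy data, and the rule that a value equal to the median falls into the low group are all hypothetical choices made for illustration.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def km_by_median(expression, times, events):
    """Split patients at the median expression and estimate one curve per group."""
    cutoff = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for expr, t, e in zip(expression, times, events):
        label = "high" if expr > cutoff else "low"  # assumption: ties go to "low"
        groups[label][0].append(t)
        groups[label][1].append(e)
    return {label: kaplan_meier(ts, es) for label, (ts, es) in groups.items()}


# Hypothetical toy cohort, not data from this study.
curves = km_by_median(
    expression=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    times=[10, 8, 12, 3, 5, 7],
    events=[1, 0, 1, 1, 1, 0],
)
print(curves)
```

Censored patients leave the risk set without lowering the curve, which is the feature that distinguishes this estimator from a simple proportion of survivors.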

Baseline characteristics of included patients
A total of 535 patients diagnosed with lung adenocarcinoma were included in this study, and their data were all obtained through the TCGA data portal. The detailed clinical characteristics are shown in Table 1. Among the included [...] 13, and 4, respectively, and there was no significant difference between the groups (P = .186). There were significant differences between the 2 groups in gender (P = .041), number of pack-years smoked (P = .018), M stage (P = .034) and OS event (P = .041).

High expression of E2F7 in LUAD
We compared the expression levels of E2F7 in LUAD and normal lung tissues. Taking the median expression level of E2F7 as the cutoff value, the patients were divided into a high expression group and a low expression group. In unpaired samples, the expression of E2F7 in LUAD was higher than in normal lung tissue (P = 1e-34) (Fig. 1A). This conclusion was verified in paired samples of LUAD and normal lung tissue (P = 2.7e-10) (Fig. 1B).
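The unpaired tumor versus normal comparison and the ROC analysis are closely linked: the Mann-Whitney U statistic (the statistic behind the Wilcoxon rank sum test), divided by the number of tumor-normal pairs, equals the AUC. The sketch below illustrates that identity in Python with hypothetical values; it computes no P value and is not the R code used in this study.

```python
def mann_whitney_auc(tumor, normal):
    """Return (U, AUC) for the hypothesis that tumor values exceed normal values.

    U counts the tumor-normal pairs in which the tumor value is larger,
    with ties counted as one half. AUC = U / (n_tumor * n_normal).
    """
    if not tumor or not normal:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                u += 1.0
            elif t == n:
                u += 0.5
    return u, u / (len(tumor) * len(normal))


# Hypothetical expression values, for illustration only.
u_stat, auc = mann_whitney_auc(tumor=[5.0, 6.0, 7.0], normal=[1.0, 2.0, 6.0])
print(u_stat, auc)
```

An AUC of 0.5 means the marker separates the two groups no better than chance, and an AUC of 1.0 means every tumor value exceeds every normal value.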

E2F7 expression and clinical characteristics
The logistic regression analysis of the correlation between E2F7 expression level and clinical characteristics is summarized in Table 2. High expression of E2F7 was significantly correlated with gender (P = .034), pathologic stage (P = .046) and M stage (P = .025). As shown in Figure 2, the Mann-Whitney U test verified the correlation between E2F7 expression and gender (P = .029) and the number of pack-years smoked (P = .002).
Dunn's test, with the Bonferroni method used to correct the significance level for multiple comparisons, showed that for primary therapy outcome the differences between stable disease (SD) and progressive disease (PD) (P.adj = .037) and between complete response (CR) and PD (P.adj = .001) were statistically significant. A significant difference was likewise found between tumor and normal tissue (P < .001).
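The Bonferroni adjustment itself is simple arithmetic: each raw P value is multiplied by the number of comparisons and capped at 1. A minimal Python sketch follows; the function name and the example P values are hypothetical, and the pairwise Dunn's test that would produce the raw values is not implemented here.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pairwise comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
print(adjusted)
```

Comparing an adjusted P value with .05 is equivalent to comparing the raw P value with .05 divided by the number of comparisons.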
Due to its high diagnostic value, we combined E2F7 with clinical variables widely considered to be related to prognosis to construct a nomogram to predict the 1-, 3-, and 5-year survival probability (Fig. 6).

E2F7 related signal pathways
We performed GO/KEGG enrichment analysis on E2F7-related genes. Under the thresholds of P.adj < 0.1 and q value < 0.2, 6 biological process (BP) terms, 12 cellular component (CC) terms, 1 molecular function (MF) term, and 2 KEGG signaling pathways were enriched (Table 4).
We performed GSEA on the E2F7 high and low expression groups to determine the differentially activated signaling pathways in LUAD. A total of 39 gene sets satisfied FDR (q value) < 0.25 and P.adjust < 0.05. Genes related to cell cycle checkpoints, DNA damage/telomere stress-induced senescence, DNA methylation, chromosome maintenance, mitotic prophase and other pathways were enriched in the high E2F7 expression group (Fig. 7).
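Over-representation analysis of GO terms and KEGG pathways is typically based on a hypergeometric test: given a background of N genes of which K belong to a term, it asks how likely it is that a list of n genes contains at least k term members by chance. The following Python sketch shows that tail probability; the function name and the numbers are hypothetical, and multiple-testing adjustment (P.adj, q value) is left out.

```python
from math import comb


def enrichment_p_value(background_size, term_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a background that contains term_size genes annotated to the term."""
    upper = min(term_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 3 in the term,
# and a 3-gene list in which all 3 belong to the term.
print(enrichment_p_value(10, 3, 3, 3))
```

A small P value indicates that the term's genes are over-represented in the list relative to the background.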
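The core of GSEA is a running-sum enrichment score computed over a ranked gene list: the sum rises at genes in the set (weighted by their ranking score) and falls at genes outside it, and the score is the largest deviation from zero. The sketch below implements only that statistic, in Python, under stated assumptions: the function name, the default weight of 1 and the toy ranking are illustrative, and the permutation-based significance and normalization steps are omitted, so its output is not comparable to the values reported here.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes : gene names sorted from most up- to most down-regulated
    scores       : ranking metric for each gene, in the same order
    gene_set     : set of gene names to test
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_total  # step up at a set member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of four genes; the set sits at the top of the list.
print(enrichment_score(["a", "b", "c", "d"], [4.0, 3.0, -1.0, -2.0], {"a", "b"}))
```

A positive score means the set is concentrated at the top of the ranking (here, the high expression group), and a negative score means it is concentrated at the bottom.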

Verification through other independent external databases
We used an independent GEO dataset (GSE50081) containing 127 LUAD patients to further verify the above results. The results of the Kaplan-Meier survival analysis confirmed the prognostic value of E2F7 for LUAD patients (Fig. 8A-C). Pan-cancer analysis of E2F7 expression in the TIMER database showed that E2F7 is highly expressed in a variety of solid tumors including LUAD (Fig. 9A). A pan-cancer analysis integrating the TCGA and GTEx databases reached similar conclusions (Fig. 9B).

Discussion
In our study, the expression of E2F7 was higher than normal in many tumors including LUAD. Its expression level was higher in men and in patients with more than 40 pack-years of smoking, and it was related to the primary therapy outcome; that is, patients with progressive disease had higher expression of E2F7. Subsequent survival analysis also showed that high expression of E2F7 is an independent risk factor for OS and has high diagnostic value. This provides a basis for using E2F7 to judge the prognosis of LUAD patients in future clinical work. Genes related to cell cycle checkpoints, DNA damage/telomere stress-induced senescence, DNA methylation, chromosome maintenance, and mitotic prophase showed significant enrichment in the E2F7 high expression group. This suggests a potential mechanism by which E2F7 affects the occurrence and development of lung adenocarcinoma and provides an important reference for further experimental exploration of that mechanism.
The occurrence and development of malignant tumors is a complex process involving multiple genes and their expressed proteins. Transcription is the first step of gene expression and is strictly regulated by transcription factors (TFs) and their cofactors, RNA polymerase, and chromatin-modifying proteins. [6] E2Fs are an important family of transcription factors that have been shown to be involved in cell proliferation, [20][21][22][23] differentiation, [24][25][26] apoptosis, [27][28][29][30] cell cycle regulation [31,32] and the DNA damage response. [33,34] So far, 8 family members have been discovered (E2F1-E2F8). By function, E2Fs are divided into transcription activators (E2F1-3) and transcription repressors (E2F4-8); by structure, they are divided into typical E2Fs (E2F1-6) and atypical E2Fs (E2F7-8). The clinical value of many E2F family members in the diagnosis and treatment of solid tumors has been established. [35][36][37][38] E2F7 differs from the typical E2Fs in that it binds DNA independently of DP proteins to exert its transcriptional repressive effect. [4,39] Studies have shown that E2F7 can inhibit cell proliferation by repressing the transcription of proliferation-related miRNAs. [40] In recent years, however, a growing number of studies have shown that E2F7 promotes tumor occurrence and development. Chu et al. reported that overexpression of E2F7 in breast cancer can inhibit miR-15a/16 transcription, cause Cyclin E1 and Bcl-2 to participate in tumor invasion and metastasis, and increase the resistance of breast cancer cells to tamoxifen. [15] In our study, the expression of E2F7 in a variety of solid tumors was analyzed through the TIMER database and UCSC XENA. The results showed that E2F7 is highly expressed in LUAD, lung squamous cell carcinoma (LUSC), esophageal carcinoma (ESCA) and other solid tumors.
No previous study has addressed the prognostic value of E2F7 expression in LUAD patients. In this study, the prognostic and diagnostic value of E2F7 was analyzed in the TCGA database by bioinformatics methods. Kaplan-Meier survival analysis showed that high expression of E2F7 was associated with shorter OS and DSS, and this conclusion was verified in the GEO dataset. Multivariate Cox analysis further confirmed that the expression of E2F7 is independently related to the OS of patients with LUAD. Other clinical features, such as locally advanced stage, lymph node metastasis, distant metastasis, later TNM stage, and the degree of surgical resection, were also related to poor prognosis. We further constructed a nomogram for the prognosis of LUAD patients based on clinical variables and the expression of E2F7, which gives clinicians a basis for predicting the survival of individual patients.
The mechanism by which E2F7 mediates tumor development is not completely clear. It may promote tumor proliferation, differentiation, infiltration and metastasis in the following ways: (1) E2F7 up-regulates Beclin-1 and mediates miR-129-induced autophagy, triggering autophagic flux [10]; (2) E2F7 increases the expression of vimentin, reduces the expression of E-cadherin protein, and promotes the EMT process [41][42][43]; (3) as a transcriptional activator of VEGFA, E2F7 cooperates with HIF-1α to induce the transcription of VEGFA and promote angiogenesis [44]; (4) E2F7 induces the transcription of collagen and calcium-binding domains and Flt to promote the formation of lymphatic vessels. [35,45] Our study found that the expression of E2F7 is related to pathways such as cell cycle checkpoints, DNA damage/telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase. Our results may relate to the mechanisms above, but these mechanisms need further research to confirm.
Although our study provides a new approach to exploring the relationship between E2F7 and the prognosis of lung adenocarcinoma, it still has several limitations. First, although we used the GEO database to verify the results of the TCGA analysis, the study population is still limited to patients in public databases, which may introduce bias. Second, because of the limited sample size and clinical indicators, our conclusions need to be confirmed by studies with larger samples. Finally, further experiments are needed to explore the role of E2F7 in tumor progression and the mechanism by which it affects tumors.
In short, high expression of E2F7 is an independent risk factor for OS in LUAD patients and has high diagnostic value. Cell cycle checkpoints, DNA damage/telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase may be the key pathways through which E2F7 regulates LUAD.